{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Transformer大家族\n",
    "\n",
    "## 1. Transformer结构\n",
    "\n",
    "Transformer结构最初就是在大2017年名鼎鼎的《Attention Is All You Need》论文中提出的，最开始是用于机器翻译任务。\n",
    "\n",
    "这里先简单回顾一下Transformer的基本结构：\n",
    "\n",
    "<img src='https://huggingface.co/course/static/chapter1/transformers_blocks.png' width=200 align=\"center\">\n",
    "\n",
    "- 左边是encoder，用于对输入的sequence进行表示，得到一个很好特征向量。\n",
    "- 右边是decoder，利用encoder得到的特征，以及原始的输入，进行新的sequence的生成。\n",
    "\n",
    "encoder、decoder既可以单独使用，又可以再一起使用，因此，基于Transformer的模型可以分为三大类：\n",
    "\n",
    "- Encoder-only\n",
    "- Decoder-only\n",
    "- Encoder-Decoder\n",
    "\n",
    "\n",
    "## 2. Transformer家族\n",
    "\n",
    "随后各种基于Transformer结构的模型就如雨后春笋般涌现出来，教程中有一张图展示了一些主要模型的时间轴：\n",
    "\n",
    "<img src='https://huggingface.co/course/static/chapter1/transformers_chrono.png' width=1000>\n",
    "\n",
    "虽然模型多到四只jio都数不过来，但总体上可以分为三个阵营，分别有三个组长：\n",
    "\n",
    "- 组长1：**BERT**。组员都是BERT类似的结构，是一类**自编码模型**。\n",
    "- 组长2：**GPT**。组员都是类似GPT的结构，是一类**自回归模型**。\n",
    "- 组长3：**BART/T5**。组员结构都差不多是**encoder-decoder**模型。\n",
    "\n",
    "### 不同的架构，不同的预训练方式，不同的特长\n",
    "\n",
    "对于**Encoder-only**的模型，预训练任务通常是“破坏一个句子，然后让模型去预测或填补”。例如BERT中使用的就是两个预训练任务就是**Masked language modeling**和**Next sentence prediction**。\n",
    "因此，这类模型擅长进行文本表示，适用于做**文本的分类、实体识别、关键信息抽取**等任务。\n",
    "\n",
    "对于**Decoder-only**的模型，预训练任务通常是**Next word prediction**，这种方式又被称为**Causal language modeling**。这个Causal就是“因果”的意思，对于decoder，它在训练时是无法看到全文的，只能看到前面的信息。\n",
    "因此这类模型适合做**文本生成**任务。\n",
    "\n",
    "而**Seq2seq**架构，由于包含了encoder和decoder，所以预训练的目标通常是融合了各自的目标，但通常还会设计一些更加复杂的目标，比如对于T5模型，会把一句话中一片区域的词都mask掉，然后让模型去预测。seq2seq架构的模型，就适合做**翻译、对话**等需要根据给定输入来生成输出的任务，这跟decoder-only的模型还是有很大差别的。\n",
    "\n",
    "### 总结表如下："
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "|类型|架构|Transformer组件 |\tExamples |\tTasks|\n",
    "|--|---|--- |\t--- |\t---|\n",
    "|**BERT**-like | auto-encoding models|\tEncoder  |\t\tALBERT, BERT, DistilBERT, ELECTRA, RoBERTa | \tSentence classification, named entity recognition, extractive question answering|\n",
    "|**GPT**-like |  auto-regressive models |\tDecoder  |\t\tCTRL, GPT, GPT-2, Transformer XL |\t \tText generation|\n",
    "|**BART/T5**-like |  sequence-to-sequence models|\tEncoder-decoder  |\t\tBART, T5, Marian, mBART |\t \tSummarization, translation, generative question answering|\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "\n",
    "本部分对应的官方链接：\n",
    "https://huggingface.co/course/chapter1/4?fw=pt"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
